Who spoke when? - automatic segmentation and clustering for determining speaker turns

Author

  • S. E. Johnson
Abstract

The problem of labelling speaker turns by automatically segmenting and clustering a continuous audio stream is addressed. A new clustering scheme is presented and evaluated using a clustering efficiency score, which treats agglomerative and divisive clustering strategies equally. Results on the 1996 Hub4 development data show that an efficiency of 70% can be obtained on both manually and automatically derived segments. For the task of identifying potentially unknown anchor speakers within broadcast news shows, the frame classification error rate is very important. To reflect this, a frame-based cluster efficiency is defined, and the results show that a frame-based efficiency of 90% can be achieved. Finally, a frame-based comparison between the manually and automatically derived segment/cluster sets shows that approximately one third of the errors are introduced during segmentation and two thirds during clustering.
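
The abstract does not spell out how the frame-based cluster efficiency is computed. As an illustration only, the sketch below computes a related, commonly used frame-level score from two frame-aligned label sequences (reference speakers and hypothesis clusters): average cluster purity (acp), average speaker purity (asp), and their geometric mean K. This purity-based measure is an assumed stand-in, not necessarily the paper's definition, and the function name purity_scores is hypothetical.

```python
from collections import Counter
from math import sqrt


def purity_scores(ref, hyp):
    """Hypothetical helper: frame-level purity scores.

    ref: reference speaker label for each frame.
    hyp: hypothesis cluster label for each frame (same length as ref).
    Returns (acp, asp, K). This is an assumed purity-based measure,
    not necessarily the frame-based efficiency defined in the paper.
    """
    if len(ref) != len(hyp) or not ref:
        raise ValueError("ref and hyp must be non-empty and frame-aligned")
    n = len(ref)
    joint = Counter(zip(hyp, ref))  # n_ij: frames in cluster i spoken by speaker j
    cluster_tot = Counter(hyp)      # n_i: total frames in cluster i
    speaker_tot = Counter(ref)      # n_j: total frames of speaker j

    # Average cluster purity: (1/N) * sum_i sum_j n_ij^2 / n_i
    acp = sum(c * c / cluster_tot[h] for (h, _), c in joint.items()) / n
    # Average speaker purity: (1/N) * sum_j sum_i n_ij^2 / n_j
    asp = sum(c * c / speaker_tot[r] for (_, r), c in joint.items()) / n
    # Geometric mean balances over-splitting against over-merging
    return acp, asp, sqrt(acp * asp)


if __name__ == "__main__":
    ref = ["A"] * 6 + ["B"] * 4
    hyp = ["c1"] * 5 + ["c2"] * 5
    print(purity_scores(ref, hyp))
```

For example, with ten frames where speaker A holds six and speaker B four, and the hypothesis splits them into two five-frame clusters, the sketch returns acp = 0.84, asp ≈ 0.833 and K ≈ 0.837. A perfect clustering scores 1.0 on all three, whatever names the clusters carry.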

Similar resources

Speaker-Based Segmentation and Adaptation in Automatic Speech Recognition (Remes)

With proper training, automatic speech recognition works quite well when tested in conditions similar to the training conditions, but with a new speaker or a new environment the system performance often degrades. Speaker-based adaptation alters the speech recognition system to better match a specific speaker and thus improves the speech recognition results. In order to use speaker adaptation, t...

Robust Unsupervised Speaker Segmentation for Audio Diarization

Audio diarization (Reynolds & Carrasquillo, 2005) is the process of partitioning an input audio stream into homogeneous regions according to their specific audio sources. These sources can include audio type (speech, music, background noise, etc.), speaker identity and channel characteristics. With the continually increasing number of large volumes of spoken documents including broadcasts, voic...

Speaker Diarization - “Who Spoke When”

Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assigns relative speaker labels to each homogeneous segment of speech. I...

Automatic Prostate Cancer Segmentation Using Kinetic Analysis in Dynamic Contrast-Enhanced MRI

Background: Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) provides functional information on the microcirculation in tissues by analyzing the enhancement kinetics, which can be used as biomarkers for prostate lesion detection and characterization. Objective: The purpose of this study is to investigate spatiotemporal patterns of tumors by extracting semi-quantitative as well as w...

Speaker diarisation for broadcast news

It is often important to be able to automatically label ‘who spoke when’ during some audio data. This paper describes two systems for audio segmentation developed at CUED and MIT-LL and evaluates their performance using the speaker diarisation score defined in the 2003 Rich Transcription Evaluation. A new clustering procedure and BIC-based stopping criterion for the CUED system is introduced wh...

Publication date: 1999